

Search: all records where Creators/Authors contains "Huang, Sanwen"


  1. The combination of ultra-long (UL) Oxford Nanopore Technologies (ONT) sequencing reads with long, accurate Pacific Biosciences (PacBio) High Fidelity (HiFi) reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, in which both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used "Pore-C" chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to that of HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the UL reads, and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and provides a multirun, single-instrument solution for the reconstruction of complete genomes.
    Free, publicly-accessible full text available November 1, 2025
  2. Pangenome graphs can represent all variation between multiple reference genomes, but current approaches to build them exclude complex sequences or are based upon a single reference. In response, we developed the PanGenome Graph Builder, a pipeline for constructing pangenome graphs without bias or exclusion. The PanGenome Graph Builder uses all-to-all alignments to build a variation graph in which we can identify variation, measure conservation, detect recombination events and infer phylogenetic relationships. 
    Free, publicly-accessible full text available November 1, 2025
  3. Effective utilization of wild relatives is key to overcoming challenges in the genetic improvement of cultivated tomato, which has a narrow genetic basis; however, current efforts to decipher high-quality genomes for tomato wild species are insufficient. Here, we report chromosome-scale tomato genomes from nine wild species and two cultivated accessions, representative of Solanum section Lycopersicon, the tomato clade. Together with two previously released genomes, we elucidate the phylogeny of Lycopersicon and construct a section-wide gene repertoire. We reveal the landscape of structural variants and provide entry to the genomic diversity among tomato wild relatives, enabling the discovery of a wild tomato gene with the potential to increase yields of modern cultivated tomatoes. Construction of a graph-based genome enables structural-variant-based genome-wide association studies, identifying numerous signals associated with tomato flavor-related traits and fruit metabolites. The tomato super-pangenome resources will expedite biological studies and breeding of this globally important crop.
  4. Potato (Solanum tuberosum L.) is the world's most important non-cereal food crop, and the vast majority of commercially grown cultivars are highly heterozygous tetraploids. Advances in diploid hybrid breeding based on true seeds have the potential to revolutionize future potato breeding and production (1–4). So far, relatively few studies have examined the genome evolution and diversity of wild and cultivated landrace potatoes, which limits the application of their diversity in potato breeding. Here we assemble 44 high-quality diploid potato genomes from 24 wild and 20 cultivated accessions that are representative of Solanum section Petota, the tuber-bearing clade, as well as 2 genomes from the neighbouring section, Etuberosum. Extensive discordance of phylogenomic relationships suggests the complexity of potato evolution. We find that the potato genome substantially expanded its repertoire of disease-resistance genes when compared with closely related seed-propagated solanaceous crops, indicative of the effect of tuber-based propagation strategies on the evolution of the potato genome. We discover a transcription factor that determines tuber identity and interacts with the mobile tuberization inductive signal SP6A. We also identify 561,433 high-confidence structural variants and construct a map of large inversions, which provides insights for improving inbred lines and precluding potential linkage drag, as exemplified by a 5.8-Mb inversion that is associated with carotenoid content in tubers. This study will accelerate hybrid potato breeding and enrich our understanding of the evolution and biology of potato as a global staple food crop.
  5. Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits (1,2). The solution to this problem is to identify all causal genetic variants and to measure their individual contributions (3,4). Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits, leading, for example, to the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
  6. The environment has constantly shaped plant genomes, but the genetic bases underlying how plants adapt to environmental influences remain largely unknown. We constructed a high-density genomic variation map of 263 geographically representative peach landraces and wild relatives. A combination of whole-genome selection scans and genome-wide environmental association studies (GWEAS) was performed to reveal the genomic bases of peach adaptation to diverse climates. A total of 2092 selective sweeps that underlie local adaptation to both mild and extreme climates were identified, including 339 sweeps conferring a genomic pattern of adaptation to high altitudes. Using GWEAS, a total of 2755 genomic loci strongly associated with 51 specific environmental variables were detected. The molecular mechanisms underlying the adaptive evolution of traits related to severe drought, strong UVB, cold hardiness, sugar content, flesh color, and bloom date were revealed. Finally, based on 30 years of observation, a candidate gene associated with bloom date advance, representing peach responses to global warming, was identified. Collectively, our study provides insights into the molecular bases of how environments have shaped peach genomes by natural selection and adds candidate genes for future studies on evolutionary genetics, adaptation to climate change, and breeding.
  7. Gene‐editing techniques are currently revolutionizing biology, allowing far greater precision than previous mutagenic and transgenic approaches. They are becoming applicable to a wide range of plant species and biological processes. Gene editing can rapidly improve a range of crop traits, including disease resistance, abiotic stress tolerance, yield, nutritional quality and additional consumer traits. Unlike transgenic events, however, gene‐editing events are not easy to detect forensically at the molecular level, as no foreign DNA exists in the elite line. These limitations in molecular detection approaches are likely to focus more attention on the products generated from the technology than on the process itself. Rapid advances in sequencing and genome assembly increasingly facilitate genome sequencing as a means of characterizing new varieties generated by gene‐editing techniques. Nevertheless, subtle edits such as single base changes or small deletions may be difficult to distinguish from normal variation within a genotype. Given these emerging scenarios, downstream 'omics' technologies reflective of editing effects, such as metabolomics, need to be used more prominently to fully assess compositional changes in novel foodstuffs. To achieve this goal, metabolomics or 'non‐targeted metabolite analysis' needs to make significant advances to deliver greater representation across the metabolome. With the emergence of new edited crop varieties, we advocate: (i) concerted efforts in the advancement of 'omics' technologies, such as metabolomics, and (ii) an effort to redress the use of the technology in the regulatory assessment for metabolically engineered biotech crops.
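Two figures quoted in the abstracts above are simple conversions that can be checked directly. Item 1 reports a base accuracy exceeding 99.999% as Q50, using the standard Phred scale, where Q = -10 * log10(P) and P is the per-base error probability. Item 5 reports a 24% increase in average estimated heritability, from 0.33 to 0.41. The minimal Python sketch below reproduces both numbers; the function names are illustrative assumptions, not taken from any of the listed papers.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred-scaled quality score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Inverse conversion, Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def relative_increase(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, e.g. 0.24 for a 24% increase."""
    return (new - old) / old


# Q50 means one expected error per 100,000 bases, i.e. 99.999% base accuracy.
err = phred_to_error(50)
print(f"Q50 -> error {err:.0e}, accuracy {1 - err:.3%}")  # Q50 -> error 1e-05, accuracy 99.999%

# Average estimated heritability: graph pangenome (0.41) vs. linear reference (0.33).
print(f"{relative_increase(0.41, 0.33):.0%}")  # 24%
```

Because the Phred scale is logarithmic, each additional 10 points of Q corresponds to a tenfold reduction in the expected error rate.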